Machine Learning Models to Predict the Risk of Rapidly Progressive Kidney Disease and the Need for Nephrology Referral in Adult Patients with Type 2 Diabetes

Early detection of rapidly progressive kidney disease is key to improving the renal outcome and reducing complications in adult patients with type 2 diabetes mellitus (T2DM). We aimed to construct a 6-month machine learning (ML) predictive model for the risk of rapidly progressive kidney disease and the need for nephrology referral in adult patients with T2DM and an initial estimated glomerular filtration rate (eGFR) ≥ 60 mL/min/1.73 m². We extracted patients and medical features from the electronic medical records (EMR), and the cohort was divided into a training/validation data set and a testing data set to develop and validate the models on the basis of three algorithms: logistic regression (LR), random forest (RF), and extreme gradient boosting (XGBoost). We also applied an ensemble approach using a soft voting classifier to classify the referral group. We used the area under the receiver operating characteristic curve (AUROC), precision, recall, and accuracy as the metrics to evaluate the performance. Shapley additive explanations (SHAP) values were used to evaluate the feature importance. The XGBoost model had higher accuracy and relatively higher precision in the referral group as compared with the LR and RF models, but the LR and RF models had higher recall in the referral group. In general, the ensemble voting classifier had relatively higher accuracy, higher AUROC, and higher recall in the referral group as compared with the other three models. In addition, we found that a more specific definition of the target improved the model performance in our study. In conclusion, we built a 6-month ML predictive model for the risk of rapidly progressive kidney disease. Early detection followed by nephrology referral may facilitate appropriate management.


Int. J. Environ. Res. Public Health 2023, 20, 3396

Introduction
Diabetes mellitus (DM) is a major cause of reduced life expectancy and premature death [1][2][3]. Mortality in diabetic patients increases significantly when renal function is impaired [4]. With improvements in treatment, the rates of some diabetic complications, such as stroke and acute myocardial infarction, have decreased, although the burden of diabetes continues to increase. However, diabetic kidney disease is still the leading cause of end-stage kidney disease (ESKD) [5,6]. According to the report by the United States Renal Data System (USRDS) in 2020 [7], Taiwan has persistently reported the highest incidence and prevalence of end-stage kidney disease worldwide. In the 2020 annual report on kidney disease in Taiwan, Lai et al. [8] reported that the percentage of diabetes among incident dialysis patients increased from 45.3% in 2010 to 46.2% in 2018, and the percentage of diabetes among prevalent dialysis patients increased from 39.7% in 2010 to 47.8% in 2018. Early detection of rapidly progressive kidney disease followed by nephrology referral is important for reducing complications and mortality [9]. Albuminuria is an important marker of diabetic kidney disease (DKD) and is associated with a poor outcome, but some type 2 diabetes mellitus (T2DM) patients have a GFR decline before the onset of albuminuria [10]. In addition, nondiabetic kidney diseases are also a possible cause of rapidly progressive kidney disease [11], and these patients should be promptly referred to an experienced nephrologist for further evaluation and management. Given the heterogeneous phenotype of type 2 diabetic renal disease, novel tools are required for the early detection of rapidly progressive kidney disease and the need for nephrology referral in T2DM patients.
Artificial intelligence (AI) has been widely applied in medical fields for diagnostic assistance, outcome prediction, and guiding treatment. Machine learning (ML) is a subset of AI. ML models are algorithms that teach a computer to learn from data [12,13]. There have been some studies of AI applications in DKD [14][15][16][17][18], but only a few studies have focused on the prediction of diabetic nephropathy and renal function decline [15,17]. The aim of our study was to construct a 6-month ML predictive model for the risk of rapidly progressive kidney disease and the need for nephrology referral in adult patients with T2DM.

Study Subjects
We retrospectively extracted the electronic medical records (EMR) in our hospital from January 2008 to June 2021. Among them, we found 62,360 patients with a diagnosis of type 2 diabetes mellitus (T2DM) according to the International Classification of Diseases (ICD) codes. The inclusion criteria of our study were as follows: (1) hospitalized at least once with ICD-9 or ICD-10 coding for T2DM, (2) at least two outpatient ICD-9 or ICD-10 codings for T2DM, and (3) age at diagnosis of T2DM ≥ 20 years. The exclusion criteria were (1) patients who underwent dialysis before the diagnosis of T2DM and (2) renal transplant patients. Our study was approved by the institutional review board of Taichung Veterans General Hospital (IRB TCVGH No: SE22064A). Patient informed consent was waived because all protected health information was deidentified and because of the retrospective nature of the data analysis. This research was funded by grants from the Ministry of Science and Technology of Taiwan (MOST 108-2314-B-005-005-MY3).

Study Design and Label Definition
In this study, we aimed to construct multiple machine learning models to predict the risk of rapidly progressive kidney disease and the need for nephrology referral in diabetes patients. We compared two different prediction outcomes of renal function deterioration and the need for nephrology referral in diabetes patients (Figure 1): (1) the estimated glomerular filtration rate (eGFR) falling below 30 mL/min/1.73 m² and (2) the eGFR falling below 45 mL/min/1.73 m². Clinical guidelines [19,20] recommend the referral of DM patients to nephrology when the eGFR falls below 30 mL/min/1.73 m². However, a previous study showed that a GFR < 45 mL/min/1.73 m² at the time of referral is also a significant risk factor for mortality [21]. Hence, the outcomes of our predictive models were deterioration of renal function from eGFR ≥ 60 mL/min/1.73 m² to (1) eGFR < 30 mL/min/1.73 m² and (2) eGFR < 45 mL/min/1.73 m².

We selected adult T2DM patients with paired eGFR records separated by a 180-day period between the reference point and the prediction target point. We first determined the target point for each individual patient and then went back to determine the reference point, selecting patients who met the criteria for the reference point. We labeled patients as being in the "referral" group if the eGFR was persistently lower than the outcome threshold (eGFR < 45 or < 30 mL/min/1.73 m²) at the target point and 90 days after the target point. We confirmed chronic kidney disease if the eGFR did not recover 90 days after the target point in the "referral" group. On the other hand, we labeled patients as being in the "non-referral" group if (1) the eGFR was persistently ≥ 30 mL/min/1.73 m² at the target point and 90 days after the target point or (2) the eGFR was persistently ≥ 45 mL/min/1.73 m² at the target point and 90 days after the target point. We further enrolled patients according to the following criteria for the reference point: (1) eGFR ≥ 60 mL/min/1.73 m² at the reference point, (2) 180-day average eGFR ≥ 60 mL/min/1.73 m² prior to the reference point, and (3) T2DM diagnosis before the reference point.
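The labeling and enrollment rules above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, "persistently" is simplified to two readings (one at the target point and one 90 days later), and returning None for patients whose two readings fall on different sides of the threshold is an assumption, since the text does not say how such patients were handled.

```python
from typing import Optional


def label_patient(egfr_target: float, egfr_90d: float,
                  threshold: float) -> Optional[str]:
    """Label one patient from two eGFR readings (mL/min/1.73 m^2).

    threshold is 30 or 45, depending on the prediction outcome.
    """
    if egfr_target < threshold and egfr_90d < threshold:
        return "referral"
    if egfr_target >= threshold and egfr_90d >= threshold:
        return "non-referral"
    # Assumption: mixed readings satisfy neither definition.
    return None


def meets_reference_criteria(egfr_reference: float,
                             mean_egfr_prior_180d: float,
                             t2dm_before_reference: bool) -> bool:
    """Check the three enrollment criteria at the reference point."""
    return (egfr_reference >= 60
            and mean_egfr_prior_180d >= 60
            and t2dm_before_reference)
```

For example, under the 30 mL/min/1.73 m² outcome, a patient with readings of 28 and 25 would be labeled "referral", whereas readings of 28 and 50 would meet neither definition in this sketch.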

Data Preprocessing and Machine Learning Models
We consulted domain experts to identify outliers in the laboratory features. We excluded outliers on the basis of medical knowledge when the recorded values were obviously inconsistent with the actual situation. Patients in the non-referral group had a more stable condition than patients in the referral group, which resulted in fewer laboratory examinations among patients in the non-referral group. Few patients in the referral group had more than 12 missing features. We therefore excluded patients with more than 12 missing features to deal with the missing data in the non-referral group and the imbalance of the data set. After that, features with more than 40% missing values were excluded, and the mean of each remaining feature was used to impute its missing values [22,23]. We chose the "last" and "average" values of each feature in the 180-day period before the reference point as input data (Figure 1). We treated our prediction of referral need as a binary classification problem.
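The three preprocessing steps described above (dropping patients with many missing features, dropping sparse features, then mean imputation) can be sketched as follows. This is an illustrative sketch using only the Python standard library, not the authors' pipeline; the function name, the dict-per-patient layout, and the use of None to mark a missing value are assumptions.

```python
from statistics import mean


def preprocess(rows, max_missing_per_patient=12, max_missing_fraction=0.40):
    """rows: list of dicts mapping feature name -> float, or None if missing."""
    # Step 1: drop patients with too many missing features.
    kept = [r for r in rows
            if sum(v is None for v in r.values()) <= max_missing_per_patient]
    if not kept:
        return []
    # Step 2: drop features missing in more than the allowed fraction
    # of the remaining patients.
    features = [f for f in kept[0]
                if sum(r[f] is None for r in kept) / len(kept)
                <= max_missing_fraction]
    # Step 3: fill the remaining gaps with the mean of each feature.
    means = {f: mean(r[f] for r in kept if r[f] is not None)
             for f in features}
    return [{f: (r[f] if r[f] is not None else means[f]) for f in features}
            for r in kept]


# Toy data with a smaller per-patient limit so every step is exercised.
toy = [
    {"a": 1.0, "b": None, "c": 5.0},
    {"a": 3.0, "b": None, "c": None},
    {"a": None, "b": None, "c": None},
    {"a": 2.0, "b": 4.0, "c": 7.0},
]
cleaned = preprocess(toy, max_missing_per_patient=2)
```

In the toy data, the third patient is dropped in step 1, feature "b" is dropped in step 2, and the missing "c" value of the second patient is replaced by the mean of the observed "c" values in step 3.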
The architecture of our prediction models is shown in Figure 2. The study cohort was divided into the following two parts: (1) the data from January 2008 to December 2019 as the training/validation data set, and (2) the data from January 2020 to June 2021 as the testing data set. Then, the training/validation data set was randomly divided, with 80% used for training and 20% for validation. We performed fivefold cross-validation within the training/validation data set to identify the optimal classifier [24][25][26][27][28]. The optimal classifier was then used to predict our outcome for each patient in the testing data set. The testing data set was independent of the training/validation data set. It provided an unbiased final model performance metric.
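The temporal split and the fivefold cross-validation described above can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names are hypothetical, each record is assumed to carry a single date used for the split (the text does not say which date was used), and the random seed is arbitrary.

```python
import random
from datetime import date

# Records dated before 1 January 2020 go to training/validation.
CUTOFF = date(2020, 1, 1)


def temporal_split(records):
    """records: list of (record_date, payload) tuples."""
    train_val = [r for r in records if r[0] < CUTOFF]
    test = [r for r in records if r[0] >= CUTOFF]
    return train_val, test


def five_fold_indices(n, k=5, seed=0):
    """Yield (train_idx, val_idx); each fold holds out about 1/k of the data."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, val
```

With k = 5, each fold trains on roughly 80% of the training/validation data and validates on the remaining 20%, which matches the proportions stated in the text. The testing data set is never touched inside this loop.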

We compared the performance of three classical machine learning algorithms to develop the predictive models: logistic regression (LR), random forest (RF), and extreme gradient boosting (XGBoost). We further applied an ensemble approach using a soft voting classifier to classify the referral group [29,30]. The ensemble combined the LR, RF, and XGBoost classifiers, with soft voting calculated on the predicted probability of the output class. All analyses were performed using Python (version 3.8) [31]. We used the area under the receiver operating characteristic curve (AUROC), precision, recall, and accuracy as the metrics to compare the performance of the different models. We also calculated the Shapley additive explanations (SHAP) values to evaluate feature importance and to explore the relationship between each feature and the outcome [32,33].
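Soft voting, as used in the ensemble, averages the class probabilities predicted by the individual models and then thresholds the average. The sketch below illustrates the idea in plain Python; it is not the authors' implementation. The equal weights, the 0.5 threshold, and the probability values are all assumptions made for illustration.

```python
def soft_vote(prob_lists, weights=None, threshold=0.5):
    """prob_lists: one list of P(referral) per model, all of equal length."""
    n_models = len(prob_lists)
    weights = weights or [1.0] * n_models  # assumption: equal weights
    total = sum(weights)
    # Weighted average of the models' probabilities, patient by patient.
    averaged = [
        sum(w * p for w, p in zip(weights, per_patient)) / total
        for per_patient in zip(*prob_lists)
    ]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels


# Made-up probabilities for three patients from three models.
lr_probs = [0.62, 0.30, 0.45]
rf_probs = [0.55, 0.20, 0.40]
xgb_probs = [0.40, 0.10, 0.80]
avg, labels = soft_vote([lr_probs, rf_probs, xgb_probs])
```

For the third patient, two of the three models fall below 0.5, so a hard majority vote would predict non-referral, but the averaged probability is about 0.55 and soft voting predicts referral. This is one way an ensemble can gain recall when a single model is confident.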
The assessment of normality was conducted using the Kolmogorov-Smirnov test. The continuous variables with normal distribution are shown as mean ± standard deviation, whereas the continuous variables with non-normal distribution are presented as the median (first quartile, third quartile). The categorical variables are reported as numbers (percentage). Tests for the statistical significance were conducted using the chi-squared test for categorical variables and the Mann-Whitney test for non-parametric continuous variables. The level of significance was set at p < 0.05. Statistical analyses were performed using MedCalc for Windows, version 20.210 (MedCalc Software, Ostend, Belgium).
Medical data are usually unbalanced. Because our data were imbalanced, we performed a pilot experiment with a new outcome target (persistent eGFR ≥ 60 mL/min/1.73 m²) to find the optimal method for this problem (see Appendix A). We applied downsampling, the Synthetic Minority Oversampling Technique (SMOTE) algorithm, and Tomek links [34] to cope with the imbalance of data [35,36]. However, the result (see Table A1 in Appendix A) showed no obvious improvement in performance. Finally, we input the original data set into our machine learning models without any of the abovementioned methods.
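Of the three rebalancing methods mentioned above, random downsampling is the simplest to illustrate. The sketch below is a hypothetical helper, not the authors' code; balancing the classes to a 1:1 ratio and the fixed seed are assumptions, since the text does not state the ratio used. SMOTE and Tomek links are not shown.

```python
import random


def downsample_majority(samples, labels, seed=0):
    """Randomly drop majority-class samples until both classes are equal in size."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority sample plus an equally sized random subset
    # of the majority class.
    keep = minority + random.Random(seed).sample(majority, len(minority))
    keep.sort()
    return [samples[i] for i in keep], [labels[i] for i in keep]
```

Downsampling discards most of the majority class, which may explain why it does not necessarily improve performance when the original data set is already informative.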

Results
A total of 19,892 adult T2DM patients were enrolled in "experiment 1" to predict the rapid renal function decline and nephrology referral when the eGFR was persistently lower than 30 mL/min/1.73 m². Among these, there were 19,244 adult T2DM patients in the "non-referral" group and 648 adult T2DM patients in the "referral" group.
In addition, a total of 16,145 adult T2DM patients were enrolled in "experiment 2" to predict the rapid renal function decline and nephrology referral when the eGFR was persistently lower than 45 mL/min/1.73 m². Among these, there were 15,159 adult T2DM patients in the "non-referral" group and 986 adult T2DM patients in the "referral" group.
Table 1 shows the baseline demographic and clinical characteristics of the included patients in experiment 1. Patients in the referral group were significantly older. Patients in the referral group had significantly more comorbidities, higher creatinine, higher BUN, higher HbA1c, lower HGB, lower albumin, higher hsCRP, higher uric acid, higher TG, higher UPCR, and higher UACR. The missing data for each variable in experiment 1 are shown in Appendix B Table A2.
Notes to Table 1: Values are expressed as median (interquartile range) or number (percentage). Non-normally distributed continuous variables were compared using the Mann-Whitney test. Categorical variables were compared using the chi-squared test. The p-value represents the comparison between the referral group and the non-referral group. CAD, coronary arterial disease; PAD, peripheral arterial disease; CHF, congestive heart failure; AKI, acute kidney injury; EV, esophageal varices; BUN, blood urea nitrogen; HbA1c, glycated hemoglobin; HGB, hemoglobin; HCT, hematocrit; AST, aspartate aminotransferase; ALT, alanine transaminase; CPK, creatine phosphokinase; hsCRP, high-sensitivity C-reactive protein; K, serum potassium; RBC, red blood count; WBC, white blood count; Bil-T, total bilirubin; CHO, total cholesterol; LDL, low-density lipoprotein; TG, triglyceride; UPCR, spot urine protein to creatinine ratio; UACR, spot urine albumin to creatinine ratio.
Table 2 shows the performance of the three models in predicting rapidly progressive kidney disease and nephrology referral when the eGFR was persistently < 30 mL/min/1.73 m². All three models achieved an accuracy of more than 0.91 and an AUROC of more than 0.96. The XGBoost model had higher accuracy and relatively higher precision in the referral group as compared with the LR and RF models. However, the LR and RF models had higher recall in the referral group. In general, the ensemble voting classifier had relatively higher accuracy, higher AUROC, and higher recall in the referral group as compared with the other three models.
Figure 3 shows the confusion matrix and predictive probabilities of the XGBoost model in experiment 1. The plot of the predictive probabilities in Figure 3 revealed that this model could distinguish the "referral" from the "non-referral" group in both the training/validation data set (Figure 3A) and the testing data set (Figure 3B).

Experiment 2: Predict Rapidly Progressive Kidney Disease and Nephrology Referral When the eGFR Was Persistently Lower than 45 mL/min/1.73 m²
Table 3 shows the baseline demographic and clinical characteristics of the included patients in experiment 2. Patients in the referral group were significantly older. Patients in the referral group had significantly more comorbidities, higher creatinine, higher BUN, lower HGB, lower albumin, higher hsCRP, higher uric acid, higher TG, higher UPCR, and higher UACR. The missing data for each variable in experiment 2 are shown in Appendix B Table A3.
Notes to Table 3: Values are expressed as median (interquartile range) or number (percentage). Non-normally distributed continuous variables were compared using the Mann-Whitney test. Categorical variables were compared using the chi-squared test. The p-value represents the comparison between the referral group and the non-referral group. CAD, coronary arterial disease; PAD, peripheral arterial disease; CHF, congestive heart failure; AKI, acute kidney injury; EV, esophageal varices; BUN, blood urea nitrogen; HbA1c, glycated hemoglobin; HGB, hemoglobin; HCT, hematocrit; AST, aspartate aminotransferase; ALT, alanine transaminase; CPK, creatine phosphokinase; hsCRP, high-sensitivity C-reactive protein; K, serum potassium; RBC, red blood count; WBC, white blood count; Bil-T, total bilirubin; CHO, total cholesterol; LDL, low-density lipoprotein; TG, triglyceride; UPCR, spot urine protein to creatinine ratio; UACR, spot urine albumin to creatinine ratio.
Table 4 shows the performance of the three models in predicting rapidly progressive kidney disease and nephrology referral when the eGFR was persistently < 45 mL/min/1.73 m². All three models achieved an accuracy of more than 0.88 and an AUROC of more than 0.93. The XGBoost model had higher accuracy and relatively higher precision in the referral group as compared with the LR and RF models. However, the LR and RF models had higher recall in the referral group. In general, the ensemble voting classifier had relatively higher accuracy, higher AUROC, and higher recall in the referral group as compared with the other models.
Table 4. Performance metrics for the three models to predict rapidly progressive kidney disease and nephrology referral when the eGFR was persistently < 45 mL/min/1.73 m².
Figure 5 shows the confusion matrix and predictive probabilities of the XGBoost model in experiment 2. The plot of the predictive probabilities in Figure 5 revealed that this model could distinguish the "referral" from the "non-referral" group in both the training/validation data set (Figure 5A) and the testing data set (Figure 5B).
Figure 5. Confusion matrix and predictive probabilities histogram of the XGBoost model in experiment 2 (persistent eGFR < 45 mL/min/1.73 m²). The green in the histogram represents the referral group, and the medium slate blue represents the non-referral group.
Figure 6 demonstrates the SHAP summary plot of the top 15 features for the XGBoost model in experiment 2. The first three features were the same in both experiment 1 and experiment 2, and the importance of proteinuria increased in experiment 2 (eGFR was persistently < 45 mL/min/1.73 m²) as compared with experiment 1 (eGFR was persistently < 30 mL/min/1.73 m²). Proteinuria (UPCR or UACR) is also an important predictor for the risk of rapidly progressive kidney disease and the need for nephrology referral.

Additional Experiment with Loose Inclusion and Labeling Criteria for Both Experiments 1 and 2
We conducted an additional experiment with loose inclusion and labeling criteria for both experiments 1 and 2. In this additional experiment, we included T2DM patients with one laboratory result showing an eGFR ≥ 60 mL/min/1.73 m² at the reference point and a T2DM diagnosis before the reference point. We also labeled patients on the basis of only one laboratory result, showing an eGFR < 30 mL/min/1.73 m² for experiment 1 and an eGFR < 45 mL/min/1.73 m² for experiment 2. In this additional experiment, we required neither a 180-day average eGFR ≥ 60 mL/min/1.73 m² prior to the reference point nor a persistently lower eGFR 90 days after the target point. Table 5 reveals that the accuracy and AUROC decreased in all three ML models for the additional experiment with loose inclusion and labeling criteria.

Discussion
Due to the heterogeneous phenotype of type 2 diabetic renal disease, determining the optimal time for nephrology referral of T2DM patients remains challenging [6]. The American Diabetes Association (ADA) recommends that (1) diabetes patients should be referred for evaluation for renal replacement therapy (RRT) if they have an eGFR < 30 mL/min/1.73 m², and (2) diabetes patients should be referred to a physician experienced in the care of kidney disease for uncertainty about the etiology of kidney disease, difficult management issues, and rapidly progressive kidney disease [19]. However, Pinier et al. [21] performed a retrospective survival analysis in DM patients over a 13-year period, and the study showed that both an eGFR < 30 mL/min/1.73 m² and an eGFR < 45 mL/min/1.73 m² at the time of referral were powerful risk factors for mortality. Therefore, we performed one experiment with a predictive target of an eGFR < 30 mL/min/1.73 m² and another with a predictive target of an eGFR < 45 mL/min/1.73 m². In addition, our study design predicted rapidly progressive kidney disease within a 6-month period, which is also an important indication for nephrology referral in T2DM.
Few studies have focused on the prediction of diabetic nephropathy and renal function decline [14][15][16][17][18]. Makino et al. [17] constructed a logistic regression ML model based on big data from the electronic medical records (EMR) of diabetes patients. Their logistic regression model had 3073 features with time series data. The accuracy of their model in predicting DKD aggravation was 0.71. Dong et al. [15] built a 3-year DKD risk predictive model in patients with T2DM and normo-albuminuria, and their study showed that the LightGBM model was the best model, with an area under the curve (AUC) of 0.815. With a different study design and predictive target, all models in our study achieved an accuracy of more than 0.88 and an AUROC of more than 0.93. Our study mainly focused on T2DM patients with rapidly progressive kidney disease in a 6-month period, as this condition is an important indication for nephrology referral. Early detection of this condition is key to improving renal outcome and reducing complications. Additionally, our study design confirmed the target condition by requiring persistent renal function impairment 90 days after the target point. The more specific and strict definition of the predictive target improved the model performance in our study (Table 5).
Our results showed that the XGBoost model had higher accuracy and relatively higher precision in the referral group as compared with the LR and RF models, but the LR and RF models had higher recall in the referral group. Lower precision means that a model raises more false alarms, which may increase the clinical load of the nephrologist. However, higher recall may be more important for patient safety because it means that fewer patients who need nephrology referral (adult T2DM patients with rapidly progressive kidney disease) are missed. In general, the ensemble voting classifier had relatively higher accuracy, higher AUROC, and higher recall in the referral group as compared with the other three models.
Some potential limitations of this study should be acknowledged. First, because of the retrospective nature of the study, unrecognized confounding factors may have biased the findings. Second, we did not analyze the impact of medication in our study, and some medications may be associated with rapidly progressive kidney disease. Third, our study had a small sample size and was conducted at a single hospital. The majority of the population was Taiwanese. Fourth, the data set was highly unbalanced, despite our attempts to deal with this problem. Models trained on imbalanced data may be subject to the accuracy paradox, in which high accuracy mainly reflects the majority class. Precision and recall may be better metrics in such conditions. Fifth, we excluded patients with more than 12 missing features to deal with the missing data in the non-referral group and the imbalance of the data set, which may have introduced bias into the analysis. Sixth, only internal validation was performed in our study; external validation using a different data set is needed. Hence, further multicenter and multinational studies are required to confirm the stability of the performance of our predictive model.
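The accuracy paradox mentioned above can be made concrete with the group sizes reported for experiment 1 (648 referral and 19,244 non-referral patients). The sketch below scores a trivial classifier that predicts "non-referral" for everyone. It is an illustration only: these counts describe the whole experiment 1 cohort rather than the testing data set, the helper name is hypothetical, and reporting precision as 0.0 when no positive prediction is made is a convention chosen here.

```python
def basic_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels (1 = referral)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Convention: report 0.0 when the denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Experiment 1 group sizes from the Results section.
y_true = [1] * 648 + [0] * 19244
# A trivial classifier that never predicts referral.
y_pred = [0] * len(y_true)
acc, prec, rec = basic_metrics(y_true, y_pred)
print(round(acc, 4), prec, rec)  # 0.9674 0.0 0.0
```

An accuracy of about 0.97 with a recall of zero shows why accuracy alone is not informative for this data set and why precision and recall in the referral group are reported alongside it.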

Conclusions
In conclusion, we built a 6-month machine learning predictive model for the risk of rapidly progressive kidney disease and the need for nephrology referral in adult patients with T2DM and an initial eGFR ≥ 60 mL/min/1.73 m². Our results showed that the XGBoost model had higher accuracy and relatively higher precision in the referral group as compared with the LR and RF models, but the LR and RF models had higher recall in the referral group. In general, the ensemble voting classifier had relatively higher accuracy, higher AUROC, and higher recall in the referral group as compared with the other three models. Early detection of rapidly progressive kidney disease is key to improving the renal outcome and reducing complications in adult patients with T2DM.

Informed Consent Statement: Patient consent was waived due to the retrospective data analysis nature of this study.

Data Availability Statement: Not applicable.
Acknowledgments: The authors thank the Clinical Informatics Research and Development Center of Taichung Veterans General Hospital for the data extraction from the Taichung Veterans General Hospital Research Database. In addition, the authors also thank the DDS-THU AI Center of Tunghai University for the development of the machine learning predictive models.

Conflicts of Interest: The authors declare no conflict of interest.

Appendix A
Because an imbalance of data was found, we performed a pilot experiment with a new outcome target (persistent eGFR ≥ 60 mL/min/1.73 m²) to find the optimal method for this problem. We applied downsampling, the Synthetic Minority Oversampling Technique (SMOTE) algorithm, and Tomek links [34] to cope with the imbalance of data [35,36]. Table A1 shows the performance metrics for the original data set and for the three methods used to deal with the imbalance of data. The result revealed no obvious improvement in performance. Hence, we input the original data set into our machine learning models without any of the abovementioned methods.

Appendix B
Tables A2 and A3 demonstrate the missing data for each variable in experiment 1 and experiment 2, respectively.